Nucleotide Sequence Databases (the principal ones)
- NCBI - National Center for Biotechnology Information
- EBI - European Bioinformatics Institute
- DDBJ - DNA Data Bank of Japan
Protein Sequence Databases
- SWISS-PROT & TrEMBL - Protein sequence database and computer annotated supplement
- UniProt - Universal Protein Resource, the world's most comprehensive catalog of information on proteins. It is a central repository of protein sequence and function data, created by joining the information contained in Swiss-Prot, TrEMBL, and PIR.
- PIR - Protein Information Resource
- MIPS - Munich Information centre for Protein Sequences
- HUPO - HUman Proteome Organization
Database Searching by Sequence Similarity
- BLAST @ NCBI
- PSI-BLAST @ NCBI
- FASTA @ EBI
- BLAT - Jim Kent's BLAT is just superb, both for its speed and for the integrated view of the results it provides
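BLAST and BLAT both rely on a seed-and-extend strategy: short exact word matches ("seeds") shared by the query and the database are found first, and only those regions are then extended into alignments. The sketch below shows only the seeding step, using a plain Python dictionary as the k-mer index. The function names and the default word size k=4 are illustrative assumptions; the real tools add word scoring, extension, and many optimizations.

```python
from collections import defaultdict


def build_kmer_index(target: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the target to the list of positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return index


def find_seeds(query: str, target: str, k: int = 4) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) pairs where query and target share an exact k-mer."""
    index = build_kmer_index(target, k)
    hits = []
    for q in range(len(query) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict
        for t in index.get(query[q:q + k], []):
            hits.append((q, t))
    return hits


print(find_seeds("GATTACA", "CCGATTACAGG", k=4))  # [(0, 2), (1, 3), (2, 4), (3, 5)]
```

All four seeds lie on the same diagonal (target_pos - query_pos = 2), which is the kind of signal a seed-and-extend tool would then extend into a full alignment.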
Sequence Alignment
- USC Sequence Alignment Server - align two sequences using many varieties of dynamic programming
- T-COFFEE - multiple sequence alignment
- ClustalW @ EBI - multiple sequence alignment
- MSA 2.1 - optimal multiple sequence alignment using the Carrillo-Lipman method
- BOXSHADE - pretty printing and shading of multiple alignments
- Splign - Splign is a utility for computing cDNA-to-Genomic, or spliced sequence alignments. At the heart of the program is a global alignment algorithm that specifically accounts for introns and splice signals.
- Spidey - an mRNA-to-genomic alignment program
- Wise2 - align a protein or profile HMM against genomic sequence to predict a gene structure, and related tools
- PipMaker - computes alignments of similar regions in two (long) DNA sequences
- VISTA - align + detect conserved regions in long genomic sequences
- myGodzilla - align a sequence to its ortholog in the human genome
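Pairwise aligners such as the USC server (and, in extended form, MSA) are built on dynamic programming. As a minimal illustration of the idea, here is a Needleman-Wunsch global alignment of two sequences with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions, not the defaults of any tool listed above.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                          # gap in b
                          F[i][j - 1] + gap)                          # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score)  # 4: six matches and one gap
```

The fill step is O(nm) in time and memory; tools that align long genomic sequences (PipMaker, VISTA) rely on heuristics or linear-space variants instead of the full matrix.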
Human Genome Databases
- Draft Human Genome @ NCBI
- Draft Human Genome @ UCSC
- Ensembl - automatically annotated human genome. Its data-mining interface (MartView) is cool and very useful!
- GDB - Genome Database
- Mammalian Gene Collection - full-length (open reading frame) sequences for human and mouse
- STACK - Sequence Tag Alignment and Consensus Knowledgebase
- GeneCards - human genes, proteins and diseases
Databases of other Organisms
- GOLD - Genomes OnLine Database, information on complete and ongoing genome projects
- TIGR Comprehensive Microbial Resource
- TIGR Microbial Database
- The Proteome Databases - yeast, worm, & human, good annotation
- Saccharomyces Genome Database
- WormBase - C. elegans
- FlyBase
- Berkeley Drosophila Genome Project
- Mouse Genome Informatics
- The Arabidopsis Information Resource
- ZFIN - Zebrafish Information Network
- DictyBase - Dictyostelium discoideum
- EcoGene - E. coli
- HIV sequence database
Genome-wide Analysis
- MBGD - comparative analysis of completely sequenced microbial genomes
- COGs - phylogenetic classification of orthologous proteins from complete genomes
- STRING - detect whether a given query gene occurs repeatedly with certain other genes in potential operons
- Pedant - automatic whole genome annotation
- GeneCensus - various whole genome comparisons
Protein Domains: Databases and Search Tools
- InterPro - integration of Pfam, PRINTS, PROSITE, SWISS-PROT + TrEMBL
- PROSITE - database of protein families and domains
- Pfam - alignments and hidden Markov models covering many common protein domains
- SMART - analysis of domains in proteins
- ProDom - protein domain database
- PRINTS Database - groups of conserved motifs used to characterise protein families
- Blocks - multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins
- Protein Domain Profile Analysis @ BMERC - search a library of profiles with a protein sequence
- TIGRFAMs - yet more protein families based on Hidden Markov Models
Motif and Pattern Search in Sequences
- Gibbs Motif Sampler - identification of conserved motifs in DNA or protein sequences
- AlignACE Homepage - gene regulatory motif finding
- MEME - motif discovery and search in protein and DNA sequences
- SAM - tools for creating and using Hidden Markov Models
- Pratt - discover patterns in unaligned protein sequences
- Motivated Proteins - a web facility for exploring small hydrogen-bonded motifs
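Motif finders such as MEME and the Gibbs sampler report motifs that are commonly summarized as position weight matrices (PWMs). Below is a minimal sketch of building a log-odds PWM from aligned sites and scanning a sequence with it. The example sites, the pseudocount of 0.5, and the uniform 0.25 background are assumptions for illustration; this is not the scoring scheme of any specific tool listed above.

```python
import math


def build_pwm(sites: list[str], pseudocount: float = 0.5,
              background: float = 0.25) -> list[dict[str, float]]:
    """Log2-odds PWM from equal-length, aligned, uppercase DNA sites."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(seq: str, pwm: list[dict[str, float]]) -> list[tuple[int, float]]:
    """Score every window of an uppercase ACGT sequence; returns (start, score) pairs."""
    w = len(pwm)
    return [(i, sum(pwm[j][seq[i + j]] for j in range(w)))
            for i in range(len(seq) - w + 1)]


sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]  # hypothetical aligned sites
pwm = build_pwm(sites)
hits = scan("GCGTATAATGCG", pwm)
best = max(hits, key=lambda h: h[1])
print(best[0])  # 3, where the consensus TATAAT starts
```

The pseudocount keeps unseen bases from producing log(0); in practice the background frequencies would be estimated from the genome being searched rather than assumed uniform.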
Protein 3D Structure
- PDB - protein 3D structure database
- RasMol / Protein Explorer - molecule 3D structure viewers
- SCOP - Structural Classification Of Proteins
- UCL BSM CATH classification
- The DALI Domain Database
- FSSP - fold classification based on structure-structure alignment of proteins
- SWISS-MODEL - homology modeling server
- Structure Prediction Meta-server
- K2 - protein structure alignment
- DALI - 3D structure alignment server
- DSSP - defines secondary structure and solvent exposure from 3D coordinates
- HSSP Database - Homology-derived Secondary Structure of Proteins
- PredictProtein & PHD - predict secondary structure, solvent accessibility, transmembrane helices, and other stuff
- Jpred2 - protein secondary structure prediction
- PSIpred (& MEMSAT & GenTHREADER) - protein secondary structure prediction (& transmembrane helix prediction & tertiary structure prediction by threading)
Phylogeny & Taxonomy
- The Tree of Life
- Species 2000 - index of the world's known species
- TreeBASE - a database of phylogenetic knowledge
- PHYLIP - package of programs for inferring phylogenies
- TreeView - user-friendly tree display for Mac & Windows
Gene Prediction
- Genscan - eukaryotes
- GeneMark
- Genie - eukaryotes
- GLIMMER - prokaryotes
- tRNAscan-SE 1.1 - search for tRNA genes in genomic sequence
- GFF (General Feature Format) Specification - a standard format for genomic sequence annotation
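GFF records are tab-separated lines of nine columns (sequence name, source, feature type, start, end, score, strand, frame/phase, and attributes), with start and end given as 1-based, inclusive coordinates and "." marking an empty field. A minimal parser sketch follows. It assumes GFF3-style attributes written as `key=value` pairs separated by semicolons; GFF2/GTF files use a different attribute syntax, so treat this as an illustration rather than a full implementation of the specification listed above. The `Feature` class and `parse_gff_line` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    seqid: str
    source: str
    feature_type: str
    start: int               # 1-based, inclusive
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the column is "."
    strand: str
    phase: Optional[int]     # None when the column is "."
    attributes: dict[str, str]


def parse_gff_line(line: str) -> Optional[Feature]:
    """Parse one GFF3-style feature line; returns None for blank or comment lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs = {}
    for field in cols[8].split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return Feature(
        seqid=cols[0], source=cols[1], feature_type=cols[2],
        start=int(cols[3]), end=int(cols[4]),
        score=None if cols[5] == "." else float(cols[5]),
        strand=cols[6],
        phase=None if cols[7] == "." else int(cols[7]),
        attributes=attrs,
    )


feature = parse_gff_line("chr1\tmanual\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=ABC1")
print(feature.end - feature.start + 1)  # 1001, because both ends are inclusive
```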
Gene Expression Databases (including RNA-seq and single cell)
- HuGE - database of human gene expression using arrays
- ExpressDB - yeast and E. coli RNA expression data
- SAGE @ NCBI - Serial Analysis of Gene Expression
- Stanford Microarray Database
- Gene Expression Omnibus (NCBI GEO)
- Expression Atlas - EBI
- BioJupies - Generates RNA-Seq data analysis notebooks New!
- Single Cell Expression Atlas - EBI New!
- Single Cell Portal (Broad Institute) New!
Gene Regulation
- TRANSFAC - database of eukaryotic cis-acting regulatory DNA elements and trans-acting factors
- EPD - eukaryotic promoter database
- DBTSS - DataBase of Transcriptional Start Sites (human)
- SCPD - Saccharomyces cerevisiae promoter database
- DCPD - Drosophila Core Promoter Database
- RegulonDB - a database on transcriptional regulation in E. coli
- DPInteract - protein binding sites on E. coli DNA
- PromoterInspector - prediction of promoter regions in mammalian genomic sequences
- MatInspector - search for transcription factor binding sites
- Cister - cis-element cluster finder
- Gene regulatory Tools
- microRNA.org: microRNA Targets & Expression Profiles
- miRBase
- TarBase - a searchable, comprehensive set of experimentally supported microRNA targets in at least 8 organisms
- microRNA resource - a gateway to all types of information about microRNAs, including articles, products, news, events, and other websites
Metabolic, Gene Regulatory & Signal Transduction Network Databases
- KEGG - Kyoto Encyclopedia of Genes and Genomes
- BioCarta
- DAVID - Database for Annotation, Visualization and Integrated Discovery - a useful server for annotating microarray and other genetic data.
- stke - Signal Transduction Knowledge Environment
- BIND - Biomolecular Interaction Network Database
- EcoCyc
- WIT
- PathGuide - a very useful collection of resources dealing primarily with pathways
- SPAD - Signaling Pathway Database
- CSNDB - Cell Signalling Networks Database
- PathDB
- Transpath
- DIP - Database of Interacting Proteins
- PFBP - Protein Function and Biochemical Networks
- Alliance for Cellular Signalling
Systems Biology
- A glossary of systems biology terms
- A list of institutes specializing in systems biology or related research
- A list of textbooks of interest to systems biologists
Gene List Annotation Tools (Functional Enrichment):
- DAVID - Database for Annotation, Visualization and Integrated Discovery - a useful server for annotating microarray and other genetic data.
- MSigDB - Molecular Signatures Database
- ToppGene Suite - gene list functional enrichment and candidate gene prioritization (my personal favorite)
- Enrichr - gene list functional enrichment with an extensive compilation of data resources (my personal favorite!)
- Metascape - gene annotation and analysis resource with excellent output options (my personal favorite!)
- Panther - Protein ANalysis THrough Evolutionary Relationships
- L2L
- Babelomics (FatiGO+)
- OntoExpress
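The enrichment tools above essentially ask whether a gene list overlaps an annotation set (a GO term or pathway, say) more than chance would predict. A common way to quantify this is the one-sided hypergeometric tail probability, which is equivalent to a one-sided Fisher's exact test. The sketch below uses only the standard library; the function name and argument order are illustrative assumptions, and real tools add background choices, multiple-testing correction, and other refinements.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k), X ~ Hypergeometric: n genes drawn from N, of which K carry the annotation.

    k: annotated genes observed in the list
    n: size of the gene list
    K: genes carrying the annotation in the background
    N: size of the background (e.g. all genes assayed)
    """
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total


# Hypothetical numbers: 100-gene list, 200 of 20,000 genes annotated, 10 in common.
p = hypergeom_enrichment_p(10, 100, 200, 20000)
print(p)
```

With these numbers the expected overlap is only 100 x 200 / 20000 = 1 gene, so observing 10 yields a very small p-value. Exact integer arithmetic avoids underflow in the binomial coefficients, at the cost of speed for very large backgrounds.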
Other Databases (Annotations, Ontologies, Consortia, etc.)
- Entrez Gene - Gene provides a unified query environment for genes defined by sequence and/or in NCBI's Map Viewer. You can query on names, symbols, accessions, publications, GO terms, chromosome numbers, E.C. numbers, and many other attributes associated with genes and the products they encode. Replaces LocusLink.
- Cancer Genome Anatomy Project
- HUGO's Human Gene Nomenclature
- Gene Ontology Consortium - a controlled vocabulary of eukaryotic gene roles
- Open Biological Ontologies - an umbrella web address for well-structured controlled vocabularies for shared use across different biological domains.
- ACUTS - compilation of Ancient Conserved UnTranslated Sequences
- UTR database
- ENZYME - enzyme nomenclature database
- BRENDA - enzyme database
- TC-DB - comprehensive classification of membrane transport proteins
- The SNP Consortium
- HGBASE - database of sequence variations in the human genome
- MethDB - DNA methylation database
- SpliceDB - canonical and non-canonical splice site sequences in mammalian genes
- SpliceOme - database of intron-exon boundaries
- InBase - intein database
- The I.M.A.G.E. Consortium
- The Kabat Database of Sequences of Proteins of Immunological Interest
- Nelson Lab: Cytochrome C
- REBASE - restriction enzyme database
- Chemfinder.com - molecule database
- Genomics Institute of the Novartis Research Foundation
- Mouse SNPs Database - 670,000+ SNP records, 8.0+ million allele calls. Allele tables are provided by investigators or retrieved from public sources. All SNPs are mapped to NCBI Mouse Genome build 33 (C57BL/6J assembly). Most are linked to NCBI dbSNP build 123.
- MetaBase - a user-contributed database of databases, listing all the biological databases currently available on the internet.
Miscellaneous Tools
- NCBI Genome Workbench - an integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publicly available sequence databases at NCBI and mix this data with your own private data.
- Morpheus - analyze gene expression in the cloud (my personal favorite)
- Repeatmasker - mask repetitive elements in DNA sequences
- Tandem Repeats Finder
- Vienna RNA Package - RNA secondary structure prediction
- mfold (1) - RNA secondary structure prediction
- mfold (2) - RNA secondary structure prediction
- EST parser - find alternative polyadenylation sites in mRNAs, using ESTs
- UTR-extender - extends missing ends of an mRNA using EST and genome sequence data
- CpG Islands - predict CpG islands
- NetStart - prediction of translation start sites in vertebrate and A. thaliana sequences
- ATGpr - prediction of translation start sites in cDNA sequences
- SignalP - secretory signal peptide prediction
- PSORT - prediction of protein sorting signals and transmembrane helices
- CBS Prediction Servers - prediction of protein subcellular localization and various sites in protein and nucleotide sequences
- Compute pI/Mw Tool
- Translate Tool
- Reverse complement nucleotide sequences
- Melting - calculate melting temperature for nucleic acid duplexes
- bend.it - calculate curvature and bendability of a DNA sequence
- webcutter - detect restriction enzyme cutting sites in DNA sequences
- Primer3 - pick primers from a DNA sequence
- Probability Distribution Calculators - normal, chi square, t, F, etc.
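Several of the small utilities above (reverse complement, translation, CpG island prediction) perform simple sequence arithmetic. A sketch using only the Python standard library follows. It assumes the standard genetic code (NCBI translation table 1) and translates reading frame 1 only. For CpG islands it computes just the observed/expected CpG ratio, one of the quantities that CpG island criteria commonly threshold together with length and GC content; it is not the algorithm of any particular server listed here.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate frame 1; '*' marks stop codons, 'X' unrecognized codons. RNA (U) is accepted."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G); 0.0 if C or G is absent."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the true dinucleotide count
    return seq.count("CG") * len(seq) / (c * g)


print(reverse_complement("ATGCGT"))  # ACGCAT
print(translate("ATGGCCTAA"))        # MA*
print(cpg_obs_exp("CGCG"))           # 2.0
```

Other reading frames follow by slicing the input (`seq[1:]`, `seq[2:]`) or by translating `reverse_complement(seq)`.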
Computational Resources
- SourceForge - SourceForge.net is a large open-source software development and hosting website with an extensive repository of open-source code and applications. It provides free services to open-source developers.
- W3C - World Wide Web Consortium, definitive reference for HTML and other WWW stuff
- Apache web server documentation
- PHP information
- Web Developer's Virtual Library - encyclopedia of web design tutorials, articles and discussions
- HTML Writers Guild
- CPAN - Perl modules
- BioPerl - bioinformatics-related Perl modules
- C++ Standard Template Library Programmer's Guide
- C++ Annotations
- Dinkum C Library Reference
- GNU C Library Reference
- C Tutorial
- Java Tutorial
- Numerical Recipes in C and Fortran
- Dictionary of Algorithms, Data Structures, and Problems
- The Linux Cookbook
- Alphabetical Directory of Linux Commands
Bioinformatics on-line course materials and tutorials (not an exhaustive collection)
Intro to bioinformatics and computational biology:
- Introduction to Bioinformatics (Technion - Israel Institute of Technology)
- Introduction to Bioinformatics (UCSD)
- A taste of bioinformatics (University College London)
- Introduction to Computational Molecular Biology (Washington University in St. Louis)
- Introduction to Bioinformatics (UCSD Extension)
- Computational Biology (University of Washington)
- Introduction to Computational Biology (Carnegie Mellon University)
- Introduction to Computational Molecular Biology (MIT)
- Introduction to Computational Molecular Biology: Genome and Protein Sequence Analysis (University of Washington)
Algorithms:
- Algorithms in Computational Biology (Technion -Israel Institute of Technology)
- Algorithms for Molecular Biology (School of Mathematical Sciences at Tel Aviv University)
Miscellaneous:
- Coursera (Great resource!) - free online courses from top universities!
- Software Carpentry (Great resource!) - a non-profit volunteer organization whose members teach researchers basic software skills
- Data Carpentry - teaches basic concepts, skills, and tools for working more effectively with data
- Elementary Sequence Analysis (McMaster University)
- Dynamic Programming Tutorial (By Eric C. Rouchka)
- Beginner's Guide to Molecular Biology (Rothamsted Research)
- A Primer on Molecular Genetics (Iowa State University)
- Online Lectures on Bioinformatics (Max Planck Institute for Molecular Genetics)
- EMBnet.org Courses (EMBnet.org)
- DNA and Protein Sequence Analysis (Boston University)
- Computational Molecular Biology (Stanford University)
- Current Topics in Genome Analysis (handouts) (NIH)
- Current Topics in Genomic Analysis (on-line video) (NIH)
- Bioinformatics and Genomic Analysis (University of Arizona)
- Biological Data and Analysis Tools (UCSD)
- Introduction to Structural Bioinformatics (UCSD Extension)
- Perl Programming Course for Bioinformatics and Internet (Feinberg Graduate School of the Weizmann Institute of Science, Rehovot, Israel)
- Object-Oriented and Database Programming for Bioinformatics and Internet (Feinberg Graduate School of the Weizmann Institute of Science, Rehovot, Israel)
- Computer Skills For Biologists (UCSD Extension)
- An Intro to R (UMN)
- Statistical Computing with R: A tutorial
- Matlab Tutorial
- Matlab Tutorial (Elementary)
- Microarray Data Analysis
Web Sites for Background Information & News
- NCBI Education - probably the best starting point for anyone contemplating a switch to bioinformatics
- NCBI Bookshelf - Includes a number of popular books in electronic format including Genomes by Brown and Human Molecular Genetics by Strachan.
- Train Online
- National Human Genome Research Institute - NIH - Educational Resources
- DOE Human Genome Project Information
- HUGO
- Bioinformatics.org: The Open Lab
- BioNews Bioinformatics Forum
- MIT Biology Hypertextbook
- Online Science and Maths textbooks
- Biochemistry online textbook
- Cell Biology online tutorials
- The Bioinformatics Resource
- The Life Sciences Resource for Schools
- Amino Acid Information
- Worthington Enzyme Manual
- Protein Family Databases
- PROW
- RAMBIOS
- GENE QUANTIFICATION web page
- International Union of Biochemistry and Molecular Biology nomenclature
- Basal Transcription Factors
- On-line Medical Dictionary Medline Plus
- Medical Dictionary Online
- Futurebiojobs
- Funding Opportunities
Other Collections of Bioinformatics Resources
- Nucleic Acids Research (NAR) database list - 2022 New!
- NAR web-server Issue - 2021 New!
- OBRC: Online Bioinformatics Resources Collection
- LabWorm: an aggregator of scientific online tools that lets you stay up to date on the newest and most relevant tools for your research. It is also a crowd-voting platform, leveraging the scientific community's hands-on experience and judgment to vote on the various tools.
- Biostars: an online question & answer resource for the bioinformatics community (my personal favorite)
- Bioinformatics software and tools: Contains several useful links to bioinformatics databases, and tools
- SoftwareSeek
- GenomeWeb
- Amos' links
- ExPASy Proteomics tools
- Atelier Bioinformatique
- Biology WorkBench
- BCM Search Launcher
- MolBiol.Net
- BMERC
This page was last updated on February 3, 2022 (new resources were added, but the URLs of previously listed resources were not checked for breakage).